Chinese-to-Japanese Patent Machine Translation based on Syntactic Pre-ordering for WAT 2016
Authors
Abstract
This paper presents our Chinese-to-Japanese patent machine translation system for WAT 2016 (Group ID: ntt), which uses syntactic pre-ordering over Chinese dependency structures. Chinese words are reordered by a learning-to-rank model based on pairwise classification to obtain a word order close to that of Japanese. In this year’s system, two different machine translation methods are compared: traditional phrase-based statistical machine translation and recent sequence-to-sequence neural machine translation with an attention mechanism. Our pre-ordering yielded a significant improvement over the phrase-based baseline but, in contrast, degraded the performance of the neural machine translation baseline.
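The abstract does not spell out the model's features or how pairwise decisions are turned into a final order. The following is a minimal Python sketch of the general idea under simplified assumptions, not the system described in the paper: each head word and its dependents form a family; a pairwise classifier (here a linear scorer with hand-set, hypothetical weights standing in for a trained model) decides for every pair whether one word should precede the other; and each family is then sorted by how many pairwise comparisons each word wins. The `Node` class, the feature names, the weights, and the win-counting step are all illustrative assumptions.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Node:
    """A word in a (hypothetical) Chinese dependency tree."""
    word: str
    index: int  # position in the original source sentence
    children: list[Node] = field(default_factory=list)


# Hypothetical hand-set weights standing in for a trained pairwise classifier.
# A positive score for (a, b) means "a should precede b in Japanese-like order".
WEIGHTS = {
    "a_is_head": -2.0,   # Japanese is head-final, so heads tend to move right
    "b_is_head": 2.0,
    "src_a_first": 1.0,  # otherwise, weakly prefer the source order
    "src_b_first": -1.0,
}


def features(a: Node, b: Node, head: Node) -> list[str]:
    """Binary features for the ordered pair (a, b) within one head's family."""
    feats = []
    if a is head:
        feats.append("a_is_head")
    if b is head:
        feats.append("b_is_head")
    feats.append("src_a_first" if a.index < b.index else "src_b_first")
    return feats


def precedes(a: Node, b: Node, head: Node) -> bool:
    """Pairwise classification: should a come before b?"""
    score = sum(WEIGHTS.get(f, 0.0) for f in features(a, b, head))
    return score > 0


def reorder_family(head: Node) -> list[Node]:
    """Order a head and its dependents by the number of pairwise wins.

    Counting wins is one simple way to turn possibly inconsistent pairwise
    decisions into a permutation; ties fall back to the source order.
    """
    items = [head] + head.children
    wins = {
        id(x): sum(precedes(x, y, head) for y in items if y is not x)
        for x in items
    }
    return sorted(items, key=lambda x: (-wins[id(x)], x.index))


def preorder(head: Node) -> list[str]:
    """Recursively reorder every family and linearize the tree."""
    out = []
    for item in reorder_family(head):
        if item is head:
            out.append(head.word)
        else:
            out.extend(preorder(item))
    return out


# "我 吃 红 苹果" (I eat a red apple): 吃 is the root; 我 and 苹果 depend on it,
# and 红 modifies 苹果.
apple = Node("苹果", 3, [Node("红", 2)])
root = Node("吃", 1, [Node("我", 0), apple])
print(" ".join(preorder(root)))  # 我 红 苹果 吃
```

With these toy weights, the verb moves after its object, yielding the subject-object-verb order typical of Japanese. In a trained system the weights would instead be learned from word-aligned parallel data, and the feature set would be much richer (for example, words, part-of-speech tags, and dependency labels).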
Similar resources
Chinese-to-Japanese Patent Machine Translation based on Syntactic Pre-ordering for WAT 2015
This paper presents our Chinese-to-Japanese patent machine translation system for WAT 2015 (Group ID: ntt) that uses syntactic pre-ordering over Chinese dependency structures. A head word and its modifier words are reordered by hand-written rules or a learning-to-rank model. Our system outperforms baseline phrase-based machine translations and competes with baseline tree-to-string machine transl...
System Description: Dependency-based Pre-ordering for Japanese-Chinese Machine Translation
This paper describes the Beijing Jiaotong University Japanese-Chinese machine translation system which participated in the 1st Workshop on Asian Translation (WAT 2014). We propose a preordering approach based on dependency parsing for Japanese-Chinese statistical machine translation (SMT). Our system achieves a BLEU of 24.12 and a RIBES of 79.48 on the Japanese-Chinese translation task in the o...
Improving Patent Translation using Bilingual Term Extraction and Re-tokenization for Chinese-Japanese
Unlike European languages, many Asian languages such as Chinese and Japanese do not have typographic word boundaries in their writing systems. Word segmentation (tokenization), which breaks sentences down into individual words (tokens), is normally treated as the first step in machine translation (MT). For Chinese and Japanese, different rules and segmentation tools lead to different segmentation results in differe...
The SAS Statistical Machine Translation System for WAT 2014
This paper describes the techniques and experimental results of SAS Institute Inc. in the WAT 2014 evaluation campaign. We participated in two subtasks of WAT 2014: the Chinese-to-Japanese track and the English-to-Japanese track. Our baseline system is the MOSES statistical machine translation toolkit. We propose syntactic reordering approaches for English to Japanese and Chinese to Japanese tran...
Otedama: Fast Rule-Based Pre-Ordering for Machine Translation
We present Otedama, a fast, open-source tool for rule-based syntactic pre-ordering, a well-established technique in statistical machine translation. Otedama implements both a learner for pre-ordering rules and a component for applying these rules to parsed sentences. Our system is compatible with several external parsers and capable of accommodating many source and all target languages...
Publication date: 2016